For the National Walkability Index, we wanted to go beyond the EPA’s current Walkability Map and created the following interactive spatial visualization that demonstrates county-level walkability scores and prevalence of health outcomes. We utilized many geographic information systems (GIS) concepts and synthesized skills learned from both the Data Science I (BIST P8105) and Public Health GIS (EHSC P8371) courses that we’re taking in the creation of this spatial visualization.
Our health outcomes of interest are:
High Cholesterol: High cholesterol among adults aged
>=18 years who have been screened in the past 5 yearsPhysical Inactivity: No leisure-time physical activity
among adults aged >=18 yearsMental Health: Mental health not good for >=14 days
among adults aged >=18 yearsPhysical Health: Physical health not good for >=14
days among adults aged >=18 yearsHigh Blood Pressure: High blood pressure among adults
aged >=18 yearsAccording to the EPA Methodology and User Guide, National Walkability Index scores on a scale of 1 to 20 are categorized as follows:
| National Walkability Index | Level |
|---|---|
| 1 - 5.75 | Least walkable |
| 5.76 - 10.5 | Below average walkable |
| 10.51 - 15.25 | Above average walkable |
| 15.26 - 20 | Most walkable |
library(rjson)
library(leaflet)
library(tidyverse)
library(plotly)
# Read in cleaned datafile
walkability = read_csv("./data/merge_final.csv")
# Get the county-level .json file from the web
url = 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'
counties = rjson::fromJSON(file = url)
# Set up the geographic projection for the map
g = list(scope = 'usa',
projection = list(type = 'albers usa'))
# Get the df for mean of walkability index by county
index_by_county = walkability %>%
mutate(fips_county = str_c(statefp, countyfp)) %>%
group_by(fips_county) %>%
summarise(ind_mean = round(mean(nat_walk_ind), 1))
# Get the df for mean of health outcome prevalence by county
hover_test_df = walkability %>%
mutate(fips_county = str_c(statefp, countyfp)) %>%
group_by(fips_county, measure_id) %>%
summarise(county_hlth = round(mean(data_value, na.rm = TRUE), 1)) %>%
pivot_wider(names_from = "measure_id", values_from = "county_hlth") %>%
janitor::clean_names()
# Merge the two df by county-level FIPS
walkability_map = full_join(x = index_by_county, y = hover_test_df, by = "fips_county")
# Adding variable for hovertext
walkability_map$hover <- with(walkability_map,
paste('<br>',
"National Walkability Index (by County):", ind_mean, '<br>',
"% Health Outcomes (by County)", '<br>',
"High Cholesterol:", highchol, '<br>',
"Physical Inactivity:", lpa, '<br>',
"Mental Health:", mhlth, '<br>',
"Physical Health:", phlth, '<br>',
"High Blood Pressure:", bphigh))
# Choropleth with hovertext of health outcomes by county by plotly
choropleth_map = plot_ly()
choropleth_map = choropleth_map %>%
add_trace(type = "choropleth",
geojson = counties,
locations = walkability_map$fips_county,
z = walkability_map$ind_mean,
text = walkability_map$hover,
colorscale = "Viridis",
reversescale = T, #to get reverse color scheme
zmin = 1,
zmax = 16,
marker = list(line = list(width = 0)))
choropleth_map = choropleth_map %>%
layout(title = "National Walkability Index in the US")
choropleth_map = choropleth_map %>%
layout(geo = g)
choropleth_map
In our interactive map, the walkability index ranges from 1 to 16, with darker colors representing a higher walkability index and lighter colors representing a lower walkability index. Visually, most of the cities in the US have a relatively lower walkability index while most major metropolitan areas have a higher walkability index.
Compared to the EPA’s
National Walkability Map, our interactive map provides additional
information about the percentage of selective health outcomes. This
interactive display of both walkability scores and various health
outcomes across the US can be helpful for various audiences who might
want to look at the spatial distribution of both simultaneously.
To further understand the National Walkability Index in smaller spatial units, we narrowed the exploration to New York City, where us five walkaholics reside.
The first step is to get a general understanding of the walkability index in the City by creating a descriptive choropleth (discrete density) map on the census tract level. We focused on the following counties:
library(sf)
library(tmap)
library(tidyverse)
library(leaflet)
library(shinyjs)
# Read-in shapefile
choropleth_NYC <- st_read("./data/cluster_analysis/map_with_everything.shp", quiet = TRUE)
# Clean up and filter
choropleth_NYC = choropleth_NYC %>%
janitor::clean_names() %>%
mutate(nat_walk_i = as.numeric(nat_walk_i),
nat_walk_i = round(nat_walk_i, digits = 1)) %>%
filter(stusps == "NY",
namelsadco %in% c("Bronx County", "Kings County", "New York County", "Queens County", "Richmond County")) %>%
rename(median_income = s1903_c03) %>%
mutate(median_income = as.numeric(median_income))
choropleth_NYC = choropleth_NYC[!is.na(choropleth_NYC$nat_walk_i),]
# Time to map
tmap_mode("view")
tm_basemap("CartoDB.Positron") +
tm_shape(choropleth_NYC) +
tm_polygons(col = "nat_walk_i",
style = "fisher",
n = 5,
title = "National Walkability Index",
palette = ("Greens"))
From this choropleth map, we can see that New York City is walkable
overall. However, we want to further identify census tracts that are
statistically significant hot spots, cold spots, and spatial outliers
using clustering analysis.
As mentioned, we want to find NYC census tracts that are statistically significant hot spots, cold spots, and spatial outliers. To start, it is crucial to understand the terms.
To find these clusters, we used local Moran’s I. The local spatial statistic Moran’s I is calculated for each zone based on the spatial weights object used. The values returned include a Z-value, and may be used as a diagnostic tool. The statistic is:
Essentially, the local Moran’s I tells us the local indicator of spatial association or LISA statistic, which is used as a cluster indicator to map different walkability clusters in the NYC.
For local Moran’s I, we also need to specify the spatial weights by defining neighborhood connectivity (what is considered a neighborhood). The goal of this step is to identify nearest census tract around a specific census tract. We used the Queen’s first order contiguity to establish neighborhood connectivity in the following analysis.
If a census tract is within the threshold distance of the selected census tract, a 1 is given, otherwise a 0 is given. Neighborhoods are created based on which observations are judged to be contiguous.
Queen’s first order contiguity suggests that, if a census tract share a vertex or a line segment with the selected census tract, they are given a weight of 1 (adjacent and are treated as a neighbor), else they are given a value 0 (nonadjacent and are not treated as a neighbor).
Visually, it looks like this:
Clearly defining contiguity is needed to conduct cluster analysis since we want to know what is considered as a neighboring census tract.
By having a basic understanding, we can plot the clustering map of walkability in NYC using Queen’s first order contiguity and local Moran’s I.
library(rgeoda)
library(sf)
library(tmap)
library(tidyverse)
library(leaflet)
# Read-in files
cluster_map <- st_read("./data/cluster_analysis/map_with_everything.shp", quiet = TRUE)
# Clean up and filter
cluster_map_clean = cluster_map %>%
janitor::clean_names() %>%
mutate(nat_walk_i = as.numeric(nat_walk_i),
nat_walk_i = round(nat_walk_i, digits = 1)) %>%
filter(stusps == "NY",
namelsadco %in% c("Bronx County", "Kings County", "New York County", "Queens County", "Richmond County")) %>%
rename(sex_male = dp05_0002p,
sex_female = dp05_0003p,
age_18plus = dp05_0021p,
race_white = dp05_0037p,
race_black = dp05_0038p,
race_aian = dp05_0039p,
race_asian = dp05_0044p,
race_2_plus = dp05_0058p,
race_hispanic = dp05_0071p,
median_income = s1903_c03) %>%
mutate(median_income = as.numeric(median_income))
cluster_map_walkability = cluster_map_clean[!is.na(cluster_map_clean$nat_walk_i),]
# Queen 1st order weights matrix
queen_w <- queen_weights(cluster_map_walkability, order = 1)
# Local Moran's I
lisa <- local_moran(queen_w, cluster_map_walkability["nat_walk_i"],
permutations = 9999)
# Get cluster column and join it to shp
cluster_map_walkability$cluster <- as.factor(lisa$GetClusterIndicators())
str(cluster_map_walkability)
# Add labels to clusters
levels(cluster_map_walkability$cluster) <- lisa$GetLabels()
# We want the GeoDa colors
lisa_colors <- lisa_colors(lisa)
# Recode clusters using tidyverse (dyplr)
cluster_map_walkability %>%
mutate(cluster = recode_factor(.x = cluster,
`Undefined` = "Not significant",
`Isolated` = "Not significant")) -> cluster_map_walkability
# Map the clusters with a basemap
tmap_mode("view")
tm_basemap("CartoDB.Positron") +
tm_shape(cluster_map_walkability) +
tm_polygons(col = "cluster",
palette = c("#eeeeee",
"#FF0000",
"#0000FF",
"#a7adf9",
"#f4ada8"),
alpha = .8,
title = "Clusters",
border.col = "black",
border.alpha = .9)
Looking at Manhattan, we can see that there are several High-High clusterings (i.e., census tracts that with walkability that are surrounded by census tracts with a high walkability index) in the Midtown and SOHO area. Some parts of Queens, such as Flushing and Forest Hills, also indicates a High-High clustering.
However, parts of the Bronx shows the Low-Low clustering, where census tracts that are low in walkability are surrounded by census tracts with low walkability index. The Low-Low clustering in Brooklyn, on the other hand, might be due to the location of the JFK airport.
You might be wondering, what is the importance of finding these statistically significant hot spots and cold spots of walkability in the census tract level?
Reflecting back to the primary goal of the project, which is to
highlight the connections between walkability, the built environment,
and community-level health, identifying these clusters can help policy
makers to put there attentions on those Low-Low areas.
The High-High neighborhoods are doing a good job in
building a walkable environment and the Low-Low areas
will benefit a lot by improving built environment infrastructure to make
the community more walkable for all.
You can get more information about local Moran’s I here, and read about R package for analyzing spatial data.
We referenced this page when defining Queen’s first order.